# Coal Mine Seismicity P-Phase Arrival Dataset

This dataset contains waveforms and analyst-determined P-phase arrival times of coal mine-related seismicity recorded by various seismic networks of different scales.
See Johnson et al. (2020) for more details.

There are two data files:

1. data.parquet

Contains preprocessed waveforms sliced into 600-sample segments. The data are saved in the [parquet format](https://parquet.apache.org/) and can be read using Pandas' [read_parquet](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_parquet.html) function.

```python
import pandas as pd

df = pd.read_parquet('data.parquet')

# get the preprocessed waveform segments
waveforms = df['data']

# get stats (contains dataset name, pick sample determined by analyst, and training flag)
stats = df['stats']

# filter df to only include traces used for training
df_training = df[df[('stats', 'used_for_training')]]
```
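The two-level column layout used above (`'data'` and `'stats'` as top-level keys) can be illustrated with a small synthetic frame. This is only a sketch: the 6-sample traces and the `dataset` and `pick_sample` column names are assumptions made for illustration, so check `df['stats'].columns` for the actual names in `data.parquet`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for data.parquet: 4 traces of 6 samples each
# (the real file holds 600 samples per trace).
n_traces, n_samples = 4, 6
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(n_traces, n_samples)))
stats = pd.DataFrame({
    "dataset": ["A", "A", "B", "C"],   # hypothetical column name
    "pick_sample": [2, 3, 1, 4],       # hypothetical column name
    "used_for_training": [True, False, True, True],
})

# Two-level columns: ('data', sample_index) and ('stats', field_name)
df = pd.concat({"data": data, "stats": stats}, axis=1)

# Keep only the traces flagged for training
training = df[df[("stats", "used_for_training")]]

# Waveforms as a 2D array (traces x samples) and picks as a 1D array
X = training["data"].to_numpy(dtype=float)
y = training[("stats", "pick_sample")].to_numpy()
```

Here `X` and `y` stay row-aligned because both are taken from the same filtered frame.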

2. raw_waveforms.hdf5

Contains unprocessed waveforms for all datasets. Each record contains 5 seconds of data before the P pick and 5 seconds after, with NaN marking missing values. The [pytables](https://www.pytables.org/) library was used to compress and store the data in the HDF5 format. Waveforms are stored at `{dataset}/{sampling_rate}/data` and pick samples at `{dataset}/{sampling_rate}/picks`. The following Python example reads the waveforms and pick samples for dataset C sampled at 1000 Hz:

```python
import numpy as np
import tables as tb

# read waveforms sampled at 1000Hz for dataset C and corresponding analyst picks
with tb.open_file("raw_waveforms.hdf5") as fi:
    waveforms = np.array(fi.root['C']['1000']['data'])
    picks = np.array(fi.root['C']['1000']['picks'])
```

Make sure to account for possible NaN in these arrays!
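One possible way to handle the missing values is sketched below on a small synthetic array rather than the real file. The 50% threshold and the choice of linear interpolation are illustrative assumptions, not the processing used by the dataset authors.

```python
import numpy as np

# Synthetic stand-in for the array read from raw_waveforms.hdf5:
# 3 records of 8 samples, with NaN marking missing values.
waveforms = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [1.0, np.nan, 3.0, 4.0, np.nan, np.nan, 7.0, 8.0],
    [np.nan] * 8,
])

# Fraction of missing samples in each record
nan_fraction = np.isnan(waveforms).mean(axis=1)

# Option 1: keep only records without any gaps
complete = waveforms[nan_fraction == 0]

# Option 2: drop records that are mostly missing, then fill the
# remaining gaps by linear interpolation along the time axis.
max_nan_fraction = 0.5  # arbitrary threshold chosen for this sketch
usable = waveforms[nan_fraction <= max_nan_fraction].copy()
for row in usable:  # each row is a view, so edits apply in place
    gaps = np.isnan(row)
    if gaps.any():
        idx = np.arange(row.size)
        row[gaps] = np.interp(idx[gaps], idx[~gaps], row[~gaps])
```

If records are dropped this way, apply the same boolean mask to `picks` so that waveforms and picks stay aligned.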

## References

Johnson, S. W., Chambers, D. J., Boltz, M. S., & Koper, K. D. (2020). Application of a Convolutional Neural Network for Seismic Phase Picking of Mining-induced Seismicity [in review].
